From experimental work performed, and reported upon in this
document, it is concluded that converting the New York State
Library (NYSL) shelf list sample to machine readable form, and
searching this shelf list using a remote access catalog, are
technically sound concepts, though the capital costs of data
conversion and system installation will be substantial. The two
primary areas of investigation covered in this report are: (1) pilot
conversion to machine readable form of a portion of the NYSL shelf
list; the purpose of this conversion process itself being the
creation of a file of machine readable records which can be searched
by a computer under the control of a telecommunication computer
terminal. The purpose of the pilot conversion test is to determine
costs of conversion, and any unusual technical problems; and (2)
experimentation with, and use of, the initial product of the pilot
conversion in catalog searching. The purpose of the search test is to
determine technical feasibility of the search process where a user
must formulate a query as a logical combination of alphabetic search
words, a process far different than the mental eye-brain scanning of
entries on catalog cards. (Author/SJ)

1. INTRODUCTION

This report describes the results achieved so far in an
experimental program in machine-assisted bibliographic control
undertaken by the New York State Library. This program was
undertaken for several reasons. First, the new library building on
the South Mall will present catalog access problems, with Reader
and Technical Services separated at such distances that the one
public card catalog will be inadequate. Second, it seems likely,
but has not yet been proven, that a single computer record may
substitute for a great number of present files; efforts now being
spent on file maintenance and searches may be reduced with
automation. Third, the long-range purpose of statewide information
network development will be aided by the construction of a NYSL
bibliographic data bank available for remote external access.

Investigations were undertaken to provide a basis for well-informed
decisions on the advisability of a computer-assisted catalog and on
the most desirable forms of storage, access, and display of
information.

The two primary areas of investigation covered in this report
are:

1. A pilot conversion to machine-readable form of a portion
of the NYSL shelf list; the purpose of this conversion
process itself being the creation of a file of machine
readable records which can be searched by a computer under
the control of a telecommunication computer terminal. The
purpose of the pilot conversion test is to determine costs
of conversion, and any unusual technical problems.
2. Experimentation with, and use of, the initial product of the
pilot conversion in catalog searching. The purpose of the
search test is to determine technical feasibility of the
search process where a user must formulate a query as a
logical combination of alphabetic search words, a process far
different than the mental eye-brain scanning of entries on
catalog cards.

The use of actual shelf list data in the experiment provided
a real environment for the conversion. The experimental costs of
conversion can therefore be projected to compute the cost of
converting the entire shelf list.

Because of its age, the shelf list was constructed by several
different methods, and about 35% of the shelf list records are
incomplete. In addition they are so messy in appearance (fields
crossed out or written in, in what seems to be an indiscriminate
fashion) as to be confusing to the personnel involved in the
tagging, editing and keyboarding.* The only attributes the shelf
list file might be considered to have with respect to a massive
retrospective conversion effort are: (a) it is organized logically
by subject area and, (b) the records needed for conversion are
easily separated from the file.



* The effort to complete the shelf list records was deemed too
expensive.

Assuming the experimental sample is representative of the
entire list, the shelf list consists of the following types of
records: L.C. cards (46%), order slips (19%), NYSL original
cataloging (9%), Serial Line Cards (10%), and various other types,
i.e., Short Original Cataloging on hand written or typed cards (16%).

During the course of the experimentation, a consulting group
from Inforonics, Inc., a firm with a substantial history in
library automation, was engaged to review progress.

1.1 Study Approach

The approach taken by Inforonics in this review was to
examine the ongoing NYSL experiments directed at key questions,
namely cost of input and search utility. For the conversion study
the results of the pilot test were analyzed and costs calculated
for each function of conversion. In the search investigation,
Inforonics helped design the search experiments and the procedure
for the documentation, as well as analyze the data obtained in
the experimental searches run by the NYSL staff.

The use of microfilm to duplicate and distribute copies
of the NYSL catalog as a Remote Access Catalog was not included
in the experimentation. In any future plans, however, it remains
an alternative which should be considered.

The primary end product of the investigation so far has been
an estimation of (1) the costs of conversion, and (2) the accuracy
of searching when compared to manual card searching. However,
observation of the experimental apparatus and procedures yielded
a great amount of qualitative information, which should be useful
for longer range planning of NYSL automation activities.

This report, in addition to describing Inforonics' work,
also contains the results of the NYSL in-house project staff
work, and should be considered a final report of the entire
conversion and searching project.

2. CONVERSION EXPERIMENT

2.1 Background

Estimating the costs and technical problems involved in
converting the NYSL shelf list to machine readable form for use
in a Remote Access Catalog is so complex that the observations
and measurements of an actual production test environment are
needed. To satisfy this need, the NYSL staff carried out a
pilot project to convert a segment of the NYSL shelf list. The
project had three main components: an in-house data preparation
effort, a contracted data tagging and keyboarding effort, and
an in-house EDP file validation and conversion effort. The project
was begun in the fall of 1969 and had progressed to a stage where
proposals for tagging and keyboarding could be solicited in
November. A contract was awarded shortly afterward. The bulk of the
work on this project was carried on during the year 1970. Its end
product was a file which was to be used as a test file for
subsequent experiments in searching a Remote Access Catalog.

2.2 File Conversion Experiment Design

The procedure for converting the shelf list to machine
readable form was developed to adhere to several basic experimental
design policies.

1. The use of Dewey Class 550-599 as an experimental sample.
This sample was considered representative of the total
shelf list, and was common enough in topic to allow good
search experimentation.

2. The use of two types of files of machine readable data
elements in the converted file: one a fully coded MARC II
and the second a modified MARC II containing an abbreviated
list of elements.

These two types of files would allow experiments to show
whether coding a modified MARC II record reduces cost.
If so, then experiments were needed to see how much its
search capability would be curtailed when compared to a
full MARC II record.

3. The use of three types of staff for manuscript preparation,
tagging, and editing tasks: clerical, semi-professional, and
professional. The skills required in converting were
relatively unknown, and allowing for the use of all types of
personnel would yield data useful in matching conversion
tasks to the skill levels of different library personnel.

In addition the subject areas chosen for this conversion were
to be of use to the Science and Technology Section, which did not
have a catalog of its own particular collection.

2.3 Experiment Design Tasks

The following design tasks were carried out in preparation
for the pilot test conversion:

2.3.1 Encoding worksheet design

A worksheet was designed on which a copy of the shelf list
card could be affixed. The worksheet contained spaces for
MARC tags and other cataloging information needed in the record.
A copy of a completed worksheet is shown in Figure 1.

2.3.2 Microfilming shelf list sample

The conventional method of duplicating library card files by
microfilming and Xerox Copyflo enlarging was found to be the lowest
cost and least disruptive method to the library procedures. The
Xerox enlargements were to be subsequently affixed to the
worksheets.

2.3.3 Tagging manual preparation

A tagging manual was developed by extracting pages from
the L.C. MARC Manual which contained the information which seemed
to be most usable.

There were two types of shelf list data entries: complete
cataloging consisting of LC or NYSL cards, called Category 1, and
incomplete cataloging consisting of serials cards, order cards,
and miscellaneous incomplete cards, called Category 2. Each of
these samples (Category 1 and Category 2) was encoded in two
ways: Full MARC encoding,* called Task I (to be done by the vendor),
and Modified MARC, called Task II, to be done by the NYSL tagging
staff.

* [illegible] misnomer, because Full MARC can only be [illegible]
catalog cards. Full MARC encoded [illegible] occasional fixed
field; however, [illegible] experiment or in any planned conversion.

Thus there were four types of records possible in the
experiment, each with its own range of tags depending on the
extent of the cataloging available.

2.3.4 Tagging staff training

A tagging and editing staff of six part-time library science
students from the State University of New York at Albany and four
full-time clerical employees was formed to handle the coding
of the source documents (3" x 5" paper slip copies of the
shelf list records stapled to the coding sheets). Each person
received approximately 1-1/2 days of training prior to
tagging the source documents. This training period consisted of
practice tagging of sample L.C. cataloging records which were
specially chosen to illustrate most MARC variable and fixed
fields and as many variations of these fields as might possibly
occur.

2.4 Conversion Procedure

The following steps were used in the conversion process:

1. The Xerox copies of the shelf list were affixed to
the worksheets.

2. The worksheets were separated into groups by the NYSL
project staff. Task I documents contained those
worksheets to be both tagged and encoded by the vendor.
Task II documents contained the worksheets to be tagged by
the NYSL experimental project staff.

3. The Task I worksheets were sent to the vendor.

4. The Task II worksheets were sent to the experimental
project staff (library school students, NYSL clerical,
and professional staff). The worksheets were tagged,
edited, and sent to the vendor for keyboarding.

5. The vendor tagged the Task I worksheets and then
encoded both Task I and II worksheets by the following
procedure:

   a. The tagged worksheets were transcribed by typing
   on an OCR typewriter.

   b. The typed worksheets were read on an OCR scanner,
   creating a magnetic tape of typed line images.

   c. The OCR output was run through a printout
   program, which contained a simple validator, producing
   a listing with error messages.

   d. The listing was proofread and marked for editing.

   e. Typed lines containing errors were retyped and
   merged with the original file, replacing the incorrect
   lines.

   f. The corrected file was processed to arrange it
   in class order and to convert it to (1) a MARC II
   input format and (2) a BCD listing tape.

   g. The BCD tape was listed, and the list and the MARC
   tape were delivered to NYSL.

6. The tape delivered to NYSL was translated to the
Control Data character set and an NYSL internal
format.

7. The translated tape was verified to assure that the
file conformed to the NYSL version of MARC. Invalid
records were deleted, to be re-input by the vendor and
reprocessed through the NYSL validation system.

8. The NYSL MARC tape was converted to the form
which SUNY Biomedical Communication Network computer staff could
enter into its system.
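
The correction step of the vendor procedure (retyping the lines
found in error and merging them with the original file) can be
pictured with a minimal sketch. The report does not describe the
vendor's program; the line identifiers, the function name
merge_corrections and the sample lines below are hypothetical and
serve only to illustrate line-for-line replacement.

```python
def merge_corrections(original_lines, retyped_lines):
    """Replace erroneous typed lines with their retyped versions.

    original_lines: list of (line_id, text) pairs in file order.
    retyped_lines:  list of (line_id, text) pairs for the lines that
                    proofreading marked as being in error.
    File order is preserved; only the marked lines change.
    """
    replacements = dict(retyped_lines)
    known_ids = {line_id for line_id, _ in original_lines}
    unknown = set(replacements) - known_ids
    if unknown:
        # A retyped line must correspond to a line already on file.
        raise KeyError(f"retyped lines with no original: {sorted(unknown)}")
    return [(line_id, replacements.get(line_id, text))
            for line_id, text in original_lines]


# Hypothetical sample data: one typed line contains a spelling error.
original = [
    (1, "245 Manual of minerology"),
    (2, "260 New York, Wiley"),
    (3, "082 549"),
]
retyped = [(1, "245 Manual of mineralogy")]
corrected = merge_corrections(original, retyped)
```

Keeping a stable identifier on every typed line is what makes the
merge a simple replacement; it is an assumption of this sketch, not
a documented feature of the vendor's system.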

2.5 Problems with the Experimental Operation

Many problems occurred in the conversion process which
had their root in (1) the lack of time to properly plan experiments,
caused by pressure to make fiscal expenditure commitments,
and (2) misrepresentation of capability on the part of the tagging
and keyboarding vendor. The net result of these problems was
(1) a delay in schedule and (2) an obscuring of the measurements
of parameters from which cost and production estimates could be
made.

2.5.1 NYSL Project Control Problems

Delays and reprocessing were caused by inadequate document
control procedures. Although the pilot operation was experimental
and covered only a small fraction of the NYSL shelf list, the
actual numbers of documents (approx. 20,000), batches (800),
and processing steps (approx. 10) were large enough
to require strict controls. Microfilm that was not inspected
properly, supplies (worksheets) that ran out, an inadequate backlog
of work (due to delays in microfilming deliveries) and lost batches
of documents all contributed to excessive time spent in expediting,
reprocessing, and rescheduling.

As well as possible, time spent in these activities was removed
from the production time measured, but it is possible that some
nonproductive time was not accounted for, which would affect the
accuracy of the data collected on labor time required to tag, edit,
and correct entries.

2.5.2 Vendor Processing Control Problems

The vendor had inadequate file control, error detection, and
manuscript control procedures. Detecting errors, finding original
manuscript to be resubmitted, and general follow-up was left
to the NYSL staff, contributing much to their administrative
workload. This effort was considerable because the vendor
did not supply the specified printout in Dewey sequence, so locating
records for checking was exceedingly difficult.

Finally, the errors found by computer validation were
found so late in the project that the use of the errors to
correct tagging and editing procedures was impossible. The
taggers themselves did not have the benefit of learning
from these mistakes.

2.5.3 Problems with non-LC cataloging procedures

Some entries in the shelf list had items which were difficult
to fit into the MARC II data item set. This is a real problem,
however, and would occur even in a properly designed production
system. Further study is required to determine whether these entries
will require being revised or recataloged.

2.5.4 Inadequate data base analysis

The lack of time for planning caused several hasty decisions
on the specifications of the Task I and Task II data bases, causing
problems in the resultant encoded data. The assignment of the
082 (LC DDC number) tag to the NYSL Dewey number caused ambiguity
because its structure is different from the LC suggested Dewey
number. The 490 (MARC) tag was used without its indicator, which
caused the field to be meaningless. Finally, the holdings statement
field 901 (local data) was improperly designed so that computer
analysis of its contents would be exceedingly difficult. This
last problem did not affect the production process, however,
because its implications were in future use of the data for
circulation control.

2.6 Experimental Results

The results obtained in the conversion experiment consist
of a determination of a conversion cost per record, some
qualitative judgment about possible cost reduction, and an error
analysis of the encoded records.

2.6.1 Conversion cost per record of Task I records, full MARC

The cost per record of the full MARC record was $1.65/record.
This value was computed from the total Task I vendor quotation of
$3844.00, eliminating the setup and programming costs of $2194.00.

2.6.2 Conversion cost per record for Task II records, modified
MARC

The cost required by the conversion process is estimated
to be $1.74/record, broken down into labor ($1.49), and computer
material and services costs ($.27). A breakdown of these costs
is shown in Table I.

In calculations to make the vendor costs and NYSL costs
comparable, NYSL direct labor costs have been burdened with 100%
overhead for supervision, payroll benefits, facilities, and
technical support.*

* The president of the vendor company told us that for a job the
size of the NYSL experiment (20,000 records) the cost would be
apportioned 25% keyboarding, 12% verification, 15% scanning,
computer editing and conversion, and 48% overhead and fees. The
overhead and fee of 48% of the total is approximately 130% of
the direct labor costs. We estimate the non-fee cost (overhead)
to be approximately 100%. This assumption places the fee at
11%, which is reasonable.
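
The arithmetic behind the vendor's apportionment can be retraced
in a short sketch. It assumes that "direct labor" means keyboarding
plus verification (25% + 12% = 37% of the total); the report does
not say so explicitly, but that reading reproduces the quoted 130%
and 11% figures. The variable names are illustrative only.

```python
# Vendor cost apportionment, in percent of the total job cost.
shares = {
    "keyboarding": 25,
    "verification": 12,
    "scanning, computer editing and conversion": 15,
    "overhead and fees": 48,
}
total_percent = sum(shares.values())

# Assumption: direct labor = keyboarding + verification.
direct_labor = shares["keyboarding"] + shares["verification"]

# Overhead and fee expressed as a fraction of direct labor (about 1.30).
overhead_and_fee_ratio = shares["overhead and fees"] / direct_labor

# Overhead estimated at 100% of direct labor; the remainder is the fee.
overhead = 1.00 * direct_labor
fee = shares["overhead and fees"] - overhead

print(f"shares sum to {total_percent}%")
print(f"overhead and fee = {overhead_and_fee_ratio:.0%} of direct labor")
print(f"fee = {fee:.0f}% of the total")
```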

2.6.3 Cost of conversion of the NYSL Shelf List

The conversion of the entire shelf list would require 128,000
man hours and $1,113,600 in total funds. A project of this
magnitude carried out over a period of 4 years would require 18
full time staff. Carried out over 2 years it would require 36
full time staff.
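
The staffing figures can be checked against the man-hour total
with a small sketch. Only the totals and staffing levels come from
the report; the working year of roughly 1,780 hours per person and
the cost per man hour are derived values, not figures stated in the
text.

```python
TOTAL_MAN_HOURS = 128_000
TOTAL_COST_DOLLARS = 1_113_600


def hours_per_staff_year(staff, years):
    """Man hours each full time staff member must supply per year."""
    return TOTAL_MAN_HOURS / (staff * years)


four_year_plan = hours_per_staff_year(staff=18, years=4)
two_year_plan = hours_per_staff_year(staff=36, years=2)

# Derived value: overall project cost per man hour.
cost_per_man_hour = TOTAL_COST_DOLLARS / TOTAL_MAN_HOURS

print(f"4-year plan: {four_year_plan:.0f} hours per person per year")
print(f"2-year plan: {two_year_plan:.0f} hours per person per year")
print(f"cost per man hour: ${cost_per_man_hour:.2f}")
```

Both plans imply the same working year, which is why halving the
schedule doubles the staff.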

2.6.4 Possibilities for cost reduction

The cost of conversion estimated is a realistic practical
figure and will not decrease with increase in volume of titles
processed or with minor technical improvement of the system. The
possibilities for further reduction lie in three areas: the use
of format recognition, the availability of additional RECON records
from the Library of Congress, and the use of MARC records produced
by other libraries.

2.6.4.1 Format recognition

The use of format recognition will probably reduce the cost
of conversion slightly. Typing costs are higher because it is a
more difficult task. Some of the costs saved in tagging are
expended in additional editing. Potential cost savings are not
available from any published source, and in our opinion will not
exceed 10%. Data will be forthcoming from the Library of Congress
shortly comparing their costs of tagging vs. format recognition.



Table I

Task I

  Vendor cost                                        $3.??/record

Task II

  Function                     Direct Cost   Overhead    Total

  NYSL tagging labor               .26          .26       .52
  NYSL revision labor              .087         .087      .17
  Vendor typing labor              .27          .27       .54
  Vendor verifying labor
    (proofreading)                 .13          .13       .26

  Total labor                                           $1.49

  Vendor computer                                         .16

  NYSL material & services
    Manuscript                     .01
    Xerox copy                     .04
    Microfilm & enlargement        .08
                                                          .13

  Total cost                                            $1.78
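
The Task II column of Table I can be re-added as a check. The
sketch below works in mills (thousandths of a dollar) to avoid
rounding problems. Two entries are hard to read in the source and
are taken here as .16 (vendor computer) and .04 (Xerox copy); that
reading is an assumption, chosen because it agrees with the printed
subtotal of .13 and the total of $1.78.

```python
# Direct labor cost per record, in mills (1/1000 dollar).
direct_labor_mills = {
    "NYSL tagging labor": 260,
    "NYSL revision labor": 87,
    "Vendor typing labor": 270,
    "Vendor verifying labor (proofreading)": 130,
}

# Labor is burdened with 100% overhead, so each row counts twice.
labor_mills = sum(2 * direct for direct in direct_labor_mills.values())

vendor_computer_mills = 160          # assumed reading of a garbled entry
material_mills = {
    "Manuscript": 10,
    "Xerox copy": 40,                # assumed reading of a garbled entry
    "Microfilm & enlargement": 80,
}
materials_total_mills = sum(material_mills.values())

total_mills = labor_mills + vendor_computer_mills + materials_total_mills

labor_dollars = round(labor_mills / 1000, 2)
total_dollars = round(total_mills / 1000, 2)
print(f"labor ${labor_dollars:.2f}, total ${total_dollars:.2f} per record")
```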

2.6.4.2 Library of Congress REtrospective CONversion (RECON)

The use of RECON tapes from the Library of Congress will
involve only computer expense, which is 10% of the total, saving
90% of the total cost (all labor). Records can be converted at
a cost of $.16. The RECON tapes presently cover English language
imprints back to 1958, so it is reasonable to assume that
100,000 titles are already available in encoded form.

2.6.4.3 Other libraries' machine records

A promising area of cost reduction is the use of encoded
records of other libraries. These records are being encoded by
several groups in large quantities, and as time progresses, the
encoding formats are progressively closer and closer to being MARC
identical. At present there are 2.5 million records to be encoded
which probably would be useful. There would be additional
computer programming and operating costs associated with their
conversion, which we estimate to be $.10/record based on the use
of 500,000 records. The computer conversion cost of such records
would be approximately $.26. In addition no data is available
which allows estimation of the percentage of the NYSL shelf list
contained in these available encoded MARC files.
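
The per-record figures in the two preceding sections fit together
as follows. The sketch uses the $1.78 Task II cost from Table I as
the base for the "10% of the total" statement; the implied
programming budget is a derived figure, not one given in the
report.

```python
# All per-record costs in cents.
TASK_II_COST = 178          # modified MARC conversion, Table I total
RECON_COMPUTER_COST = 16    # computer expense only, using RECON tapes
EXTRA_PROGRAMMING_COST = 10 # per record, spread over 500,000 records
RECORDS_ASSUMED = 500_000

# Computer expense as a share of the full conversion cost (about 9%,
# which the report rounds to 10%).
recon_share = RECON_COMPUTER_COST / TASK_II_COST

# Converting another library's record adds programming and operating
# costs to the basic computer expense.
other_library_cost = RECON_COMPUTER_COST + EXTRA_PROGRAMMING_COST

# Derived: the total programming and operating outlay that a charge
# of $.10/record over 500,000 records would represent, in dollars.
implied_programming_dollars = EXTRA_PROGRAMMING_COST * RECORDS_ASSUMED // 100

print(f"RECON computer share: {recon_share:.1%}")
print(f"other-library record: ${other_library_cost / 100:.2f}")
print(f"implied programming outlay: ${implied_programming_dollars:,}")
```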
2.6.5 Conversion costs at other libraries

A telephone survey of other conversion projects was made, and
the following costs were obtained. These costs are not exactly
comparable to the experimental costs because (1) different methods
of accounting are used for overhead and computer costs and (2)
there are variations in the accuracy of the final product.

  NYSL Full MARC Task I         $1.65/record
  NYSL Modified MARC Task II    $1.78/record
  Library of Congress           $2.?6/record (exclusive of
                                computer cost)
  2nd Commercial Vendor         $2.60/record

2.6.6 Analysis of errors

One of the significant results of comparing the Remote Access
Catalog Conversion Project with other conversion efforts is the
relationship of encoding cost to percentage errors in the final
product. The data obtained were not accurate enough to compute
quantitative relationships of cost to error percentages; however,
it was possible to compare the NYSL experiment with error data from
a second commercial vendor.

The tagging and typographic errors contained in the keyboarded
copy we have separated into two types, defined as follows:

Logical errors - those errors which can be detected but
not corrected by a computer program of moderate
complexity, but without extensive dictionaries.
Spelling errors - errors in spelling of any string of
characters in an item, including spacing and punctuation.

The errors at successive stages of the two input processes
are compared, expressed as a percentage of MARC II records in
error. Some data are not available.

NYSL Experiment ($1.78/record)

  Stage                                   Logical    Spelling

  at keyboarding                          unknown    unknown
  after 1st edit                          14.1%      14%
  after NYSL error analysis
    and vendor re-edit                    1.1%       [illegible]

2nd Commercial Vendor ($2.60/record)

  Stage                                   Logical    Spelling

  at keyboarding                          -          30%
  after 1st proofreading & edit           .1%        1%
  after 2nd proofreading, checking
    and edit                              .0%        .02%

We think this comparison is a useful one, for it points out
that the difference between a very good file and a file with
considerable typographic errors is one additional high quality
proofreading and editing pass, which contributes approximately
a 50% increase in cost.
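
The size of that increase follows from the two per-record costs
being compared. A quick check, using only the $1.78 and $2.60
figures quoted for the NYSL experiment and the second commercial
vendor:

```python
NYSL_COST = 1.78      # dollars per record, NYSL experiment
VENDOR_COST = 2.60    # dollars per record, 2nd commercial vendor

# Relative increase in cost for the more accurate file.
increase = (VENDOR_COST - NYSL_COST) / NYSL_COST

print(f"cost increase: {increase:.0%}")
```

The result is a little under one half, in line with the rounded
figure given in the text.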

2.6.7 Effect of error on file usage

Although the error rate in the NYSL final product is quite
high, spelling errors occurring in [illegible] of the records
encoded, only about 2% of these errors could cause errors in the
remote access catalog searching experiments.* These serious errors
were contained in words in the elements potentially useful as
search elements, such as short title. The bulk of the nonserious
errors were punctuation, spacing or errors in fields not likely to
be searched.

Although the file error rate might be acceptable for machine
search purposes, in the use of a file in technical processing or
the production of printed catalogs, [illegible] error would be
above that acceptable by cataloging tradition. The only severe
shortcoming is really an esthetic one.

* This will be discussed in greater detail in Chapter 3 on Searching.

A powerful concept which can be easily applied to a
machine system is the correction of a catalog file based on
reports of errors from users. In a manner similar to the way
in which L.C. corrects and reprints its cards upon notification
from users, NYSL could correct its machine file. As long as the
file is accurate enough to be acceptable for use (which the
experimental NYSL file is, we think) it can be edited every time
a user spots an error.

One can carry this line of reasoning through to the
conclusion that all systems, no matter how accurate, accept user
input for error correction. Speaking loosely in a mathematical
sense, it probably costs an infinite amount to create a large
file with zero error. As a corollary, each succeeding error
found costs more to find than its predecessor. It seems
practical therefore to let the users find them at some point, for
their effort costs nothing as it is a byproduct of their
searching activities.






3. ON LINE SEARCH EXPERIMENTS

The two concepts underlying the Remote Access Catalog
are "remote", which means that access can be done at places remote
in and outside the library, and "access", which implies that a
searching capability is available in the system. The problem
of "remoteness" is not a difficult one, and many successful computer
systems exist which operate at a distance from their users, connected
by telecommunications.

The problem of search is not simple, however, and there are
few systems in operation on any large scale and none which can
perform in any demonstrable way what the proposed NYSL Remote
Access Catalog is supposed to do.

To investigate the unknowns in the proposed concept, an
experimental program was carried out which allowed project
personnel to search the data base converted from the 550-599
sections of the NYSL Catalog by a variety of access points. This
search experimentation could be evaluated qualitatively from a
user's point of view and also could be compared to manual card
catalog searching.

3.1 Experimental Design Policy

The basic policy decision in the design of the search
experiments was to use the Upstate Medical Center Biomedical
Communication Network (BCN) search system. This was a laudable
decision because much can be learned from this system without
any programming cost. Any possible disadvantages of not being
able to do exactly what one would like to are far outweighed by
the cost saving, estimated to be in the hundreds of thousands of
dollars. Any answers to questions one can't develop experimentally
may be arrived at analytically, given measured values of
experiments performed in those areas where one can.

The Biomedical Communication Network is a group of libraries
connected to a search center by telecommunication lines. The search
center, located (at the time of the experiment) at the Upstate
Medical Center in Syracuse, New York, accepts all search requests
and displays results via IBM 2740 typewriter terminals. The
center's computer can also serve as a communication switching mode
which allows one library to communicate with another.

The use of the Upstate Medical Center search system, which
is an on-line version of the IBM DOC PROC System, as an experimental
tool gives one the following capability. A machine file is
created where words representing authors (or more generally personal
and corporate names), titles, subjects, and the MARC fixed fields are
stored in a computer memory. These data elements can be searched
in isolation or combined in logical and, and/not, or, and
and/or combinations. Additionally a list of stop words is provided
so that one does not need to concern himself with problems of
initial articles or non-significant words.
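
The kind of searching described here, keywords drawn from
catalog records, a stop word list, and logical combinations of
terms, can be illustrated with a minimal sketch. It is not the BCN
or IBM program; the stop word list, the function names and the
three sample titles are hypothetical, and only the and, or and
and/not combinations are shown.

```python
import re
from collections import defaultdict

# Hypothetical stop word list; the actual BCN list is not given.
STOP_WORDS = {"a", "an", "and", "of", "the", "in", "on", "to", "for"}


def keywords(text):
    """Return the searchable words of a text, minus stop words."""
    return {word for word in re.findall(r"[a-z0-9]+", text.lower())
            if word not in STOP_WORDS}


def build_index(records):
    """Map each keyword to the set of record numbers containing it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for word in keywords(text):
            index[word].add(record_id)
    return index


def search_and(index, a, b):
    return index.get(a, set()) & index.get(b, set())


def search_or(index, a, b):
    return index.get(a, set()) | index.get(b, set())


def search_and_not(index, a, b):
    return index.get(a, set()) - index.get(b, set())


# Hypothetical sample titles standing in for shelf list records.
records = {
    1: "A manual of mineralogy",
    2: "The geology of New York",
    3: "Mineralogy and geology of the Adirondacks",
}
index = build_index(records)
both = search_and(index, "mineralogy", "geology")
either = search_or(index, "mineralogy", "geology")
geology_only = search_and_not(index, "geology", "mineralogy")
```

Because initial articles such as "A" and "The" never enter the
index, the searcher need not know how a title begins, which is the
contrast with card catalog filing drawn in the text.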

The second experimental tool available to the project is the
NYSL manual card catalog containing the identical data. In this
form the titles can be searched by a set of access points
consisting of the initial words of the author, title or subject
heading. Given these experimental tools and the technical
requirements stated above, the next step is to design an
experimental search procedure which yields data on the utility of
the remote access catalog.

The general procedure followed in the selection and setup of
the data sample and documentation of searching was as follows.

3.2 Preparation for Experiment

The preparation for the experiment included six tasks to
prepare data and personnel for the actual test searching.
These tasks were:

1. Train New York State Library and Inforonics personnel to
be able to understand the capabilities of the Biomedical
Communications System.

2. Convert 18,000 records, previously encoded from monographs
and serials of the New York State Library shelf list into
the MARC format, into the internal operating format of
the BCN system.

3. Develop experimental procedures for performing searches and
collecting and tabulating data.

4. Select a group of test search requests from requests
originated by libraries in the New York State Interlibrary
Loan (NYSILL) Network.

5. Assign responsibilities for various segmented analytical
and evaluation tasks to Inforonics personnel and the New
York State Library personnel.

6. Develop tools for evaluating the effectiveness of the search
results.

3.2.1 Training of project personnel

New York State Library and Inforonics staff studied the
capabilities of the Biomedical Communications DPS system from
system manuals and by actual visitation to the Biomedical
Communications Center at Upstate Medical in Syracuse. A small
subset of the 18,000 encoded shelf list records had been converted
to the Biomedical internal format, and by actual operation on
this small sample the project personnel learned the command
language of the BCN system. Additional NYSL staff were taught to
use the BCN system by project personnel who had acquired knowledge
first hand by visits to the Syracuse BCN facility.

There were two difficulties with learning to use the BCN
query language. First, it is a general search system, so the
data base to be searched and printing options for successful
matches had to be specified with every query. Thus with every
query there was more than just the search words to remember and
to key. Secondly, the search syntax was very rigid. Some
words and symbols had to be used in a specified way. "Option"
began each search. A semicolon ended each line; "end" ended
each search. Furthermore, each word to be searched had to be
identified by the searcher as personal name, corporate and
conference name, or subject and title word.

Neither of these problems was insurmountable, but they did
make learning the BCN query language difficult.
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: 3»2.2 Data base conversion 

There were two steps to converting the data base to the
internal format used by the BCN-DPS system. The first step
was to determine which data fields and/or MARC subfields would
make the keyword indices for searching. In order to keep costs
down, NYSL had agreed to pay the rental for only one disk
pack. Unfortunately, although the amount of disk storage needed
for the 18,000 record file could be predicted, there was no
means of predicting how many unique keywords the various data
fields would generate, nor how much disk storage the indices
would need. Therefore, it was decided to index a limited number
of fields, and as a result the disk pack was nowhere near
capacity. With better storage projections more fields could
have been indexed.

The second step was the programming to convert the MARC
formatted data base. This was done by BCN personnel. The
conversion of the data base was checked by displaying records
in response to test queries.

When the conversion was deemed satisfactory the data
base was converted. The indexes were created, and were used
to produce keyword frequency lists. Each indexed word had one
of three prefixes depending on the data field in which the
word was found: '0' for personal name used as author, added
entry, or subject added entry; '1' for corporate or conference
name used as author, series, or added entry; and 'blank' for
title and subject words. The word frequency lists were in
sections according to prefix. Each section was arranged
alphabetically with the frequency of occurrence of the word in
the data base and with the count of documents in which the word
appeared.
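
The prefixed index and its frequency lists can be sketched in a few
lines. This is a minimal illustration only, not the BCN-DPS
implementation: the record layout, the field names (personal_names,
corporate_names, title_subject), the tokenizing rule, and the sample
records are all assumptions made for the example. Only the three
prefixes come from the text.

```python
import re
from collections import Counter, defaultdict

# Prefixes as described in the text; the field names are hypothetical.
PREFIX_BY_FIELD = {
    "personal_names": "0",   # personal name as author, added entry, subject
    "corporate_names": "1",  # corporate or conference name
    "title_subject": " ",    # 'blank' prefix for title and subject words
}


def tokenize(text):
    """Split a field value into upper-case words (an assumed rule)."""
    return re.findall(r"[A-Za-z0-9]+", text.upper())


def build_frequency_lists(records):
    """Return {prefix: {word: (occurrences, documents)}}, words sorted."""
    occurrences = defaultdict(Counter)
    documents = defaultdict(Counter)
    for record in records:
        for field, prefix in PREFIX_BY_FIELD.items():
            words = tokenize(record.get(field, ""))
            occurrences[prefix].update(words)
            documents[prefix].update(set(words))  # count each record once
    return {
        prefix: {w: (occurrences[prefix][w], documents[prefix][w])
                 for w in sorted(occurrences[prefix])}
        for prefix in occurrences
    }


# Invented sample records, for illustration only.
sample = [
    {"personal_names": "Darwin, Charles",
     "title_subject": "Origin of species"},
    {"corporate_names": "Geological Society",
     "title_subject": "Species and species lists"},
]
lists = build_frequency_lists(sample)
print(lists[" "]["SPECIES"])  # (3, 2): three occurrences in two records
```

Keeping the occurrence count and the document count separate mirrors
the two figures printed beside each word in the frequency lists.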

These printed indexes were necessary when formulating
a search. The user could determine if a word was indexed
as he guessed, for example as subject or title, and if indexed,
in how many documents. This latter would help the user decide
if he needed more index terms to narrow the search, i.e., produce
fewer "hits". As helpful as these indexes were, however, they
implied an extra look-up and extra time before the computer
search itself. It would have been more helpful to have the computer
do this lookup, and report if the word or combination of words was
indeed in the indexes and the frequency count.
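
A machine lookup of this kind might look like the following sketch.
It assumes an inverted index held in memory as a mapping from
(prefix, word) to the set of record numbers containing that word.
The structure, the function name report, and the sample postings are
hypothetical and do not describe the BCN system.

```python
def report(index, terms):
    """For each (prefix, word) term, give the number of documents indexed
    under it (0 means not indexed), plus the number of documents that
    contain every term together."""
    per_term = {}
    combined = None
    for term in terms:
        postings = index.get(term, set())
        per_term[term] = len(postings)
        combined = set(postings) if combined is None else combined & postings
    return per_term, len(combined or ())


# Hypothetical postings: prefix '0' = personal name, ' ' = title/subject.
index = {
    ("0", "DARWIN"): {1, 7, 9},
    (" ", "SPECIES"): {1, 2, 9, 12},
    (" ", "ORIGIN"): {1, 5},
}

counts, hits = report(index, [("0", "DARWIN"), (" ", "SPECIES")])
print(counts)  # document count per term
print(hits)    # documents containing every term
```

A searcher shown these counts before running the query could decide
at once whether to add terms, without consulting a printed list.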

A master list of the complete data base would be needed
for evaluation, so the 18,000 MARC records were sorted by Dewey
decimal number and printed.

3.2.3 Assignment of analysis and evaluation tasks

The performance of the necessary searching, data collecting
and evaluation tasks during the experiment posed some difficult
scheduling and personnel assignment problems, which were
further complicated by system failures and the early termination
of test searching. The plan involved the assignment of
personnel for the various tasks and the specification of
procedure for an experimental work flow.
3.2.3.1 Personnel

There were three principal participants involved in the
on-line search experiment.

1. A search coordinator who selected the requests, grouped
the requests, evaluated the computer searches, and when
necessary re-searched the computer searches. Robert Vines
was the search coordinator.

2. A search assistant who was under the direction of the search
coordinator, and who did most computer searches and some
comparative searches in the card catalog.

3. A tabulator, who kept tally sheets of all requests and computer
searches, and kept the search coordinator informed of what
types of searches had been done and which had not. Mary
Madden, of Inforonics, was the tabulator. Originally, the
search coordinator would determine why searches failed and
how these searches should be re-searched successfully. When
it became apparent that there was not enough time to re-search
requests, the coordinator sent them to the tabulator, whose
task it became to determine why searches failed.

3.2.4 Selection of search requests

The searches were grouped into five types or categories. In
all cases the 'A' group were those not found in the New York State
Library card catalog.

a. Type I were presumed to be personal author and title requests.

b. Type II and II-A were presumed to be title main entry requests.

c. Type III and III-A were presumed to be corporate and/or
series entry requests.

d. Type IV and IV-A were synthetic requests which were derived
from existing file records on the data base.

e. Type V and V-A were presumed to be subject searches.

3.2.5 Procedure

The procedure and experimental work plan consisted of
the following steps:

The search coordinator grouped all requests into one of four
categories (Type I, II, III, IV).

The search coordinator numbered all requests.

The search coordinator gave a group of requests to the search
assistant. In the beginning requests were author-title
only (Type I).

The search assistant searched each request on the BCN-DPS
system. He could re-search any request up to three times,
provided each attempt was a syntactically correct
BCN-DPS search with no spelling errors. This was the original
plan; the pressure of time forced its abandonment,
so that most searches had only one try.

The search coordinator evaluated each search, and
recorded his findings on the "search evaluation" sheet.
Searches were divided into three groups: successfully
completed searches, successful searches with too many hits,
and unsuccessful searches. The search coordinator was to
record in a log book the status of each search. This was also
not done, because of the time element.

Nothing further was done to successfully completed searches.

Successful searches with too many hits were to be re-searched
on the computer by the search coordinator to determine
appropriate means of reducing the number of hits. This did
not really prove a problem.

Unsuccessful searches were sent to the tabulator, to determine
why they failed.

"Search Evaluation" sheets, hard copy (from the terminal), and
copies of NYSILL requests were sent to the tabulator.

i. Unsuccessful searches, if there was a Dewey number
on the request which was within the required range, were to
be searched on the Dewey listing of the computer data base to
determine if the title was in the data base. If the title
was in the data base, the manual search was to continue to
ascertain why the title was not retrieved. In all but one
case, the information from the Dewey listing and the
computer search were sufficient to determine why the search
had failed. If the title was not in the data base, the manual
search was to continue to determine why the title was not in
the data base.

ii. Unsuccessful searches with no Dewey number on the
request were to be searched in the public catalog to determine
if the title was indeed in the library, and if so, what the
Dewey number was. If the title was not in the library the
search ended. If the title was in the library and should
have been in the data base, the search was to continue to
ascertain if the title was in the data base, and why it was
not retrieved, or if it was not in the data base, why not.
The results of these searches were to be recorded on the
"search evaluation" sheet. This in fact was not done, but
might be a good study to do.

12. Once the routines had been established, synthetic requests
(Type IV) were given to the search assistant. These, time
permitting, were searched in the Public Catalog first, and
then on the computer.

13. Searches for retrieval item experimentation were to be done
by the search coordinator. These were also to be done after the
basic routines were established. These were not done
due to the pressures of time.

14. Subject searches were also to be done, after the bulk of the
other searches were done. There were however no NYSL subject
requests to be found.

3.2.6 Development of evaluation methods

Search effectiveness is defined as the percentage of requests
matched by entries in the data base which represent titles
actually desired. The numerical value of search effectiveness
varies with the difficulty of the search, which in turn depends on
type of request, search logic and vocabulary used, etc.
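
As a hedged illustration of this definition, the sketch below
computes search effectiveness as a percentage, both overall and per
request type. The tuple layout and the sample outcomes are invented
for the example and are not data from the experiment.

```python
from collections import defaultdict


def search_effectiveness(outcomes):
    """outcomes: iterable of (request_type, succeeded) pairs.
    Returns (overall_percent, {request_type: percent})."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for request_type, succeeded in outcomes:
        totals[request_type] += 1
        successes[request_type] += int(succeeded)
    by_type = {t: 100.0 * successes[t] / totals[t] for t in totals}
    grand_total = sum(totals.values())
    overall = (100.0 * sum(successes.values()) / grand_total
               if grand_total else 0.0)
    return overall, by_type


# Invented outcomes, for illustration only.
outcomes = [("I", True), ("I", True), ("I", False),
            ("II", True), ("IV", False)]
overall, by_type = search_effectiveness(outcomes)
print(overall)   # 60.0
print(by_type)
```

Reporting the figure per type as well as overall reflects the point
made above that effectiveness varies with the type of request.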



3.2.6.1 Search evaluation sheet

The test search data were recorded on a Search Evaluation
Sheet (Figure 1) in order to make subsequent analysis and
tabulation more convenient and to determine if the search was
successful. A description of the Search Evaluation Sheet follows
along with experience gained in using it. In its mode of use
the tabulator (Madden) would be able to evaluate the searches
without reference to the majority of NYSILL requests on the
machine file. Due to the pressure of time, the forms were not
always filled out completely; however the designated data fields
still determined which items to evaluate a search by.

Item 1* - "Type"

The searches were divided into four categories or types so
that in addition to measuring the effectiveness of the total
sample of searches, subtotals could be determined by:

1. author title requests from NYSILL "I"

2. title main entry requests from NYSILL "II"

3. corporate main entry requests from NYSILL "III"

4. searches of all types not from NYSILL but
simulated from entries known to be in the data
base "IV"

* Item numbers refer to Figure 1.

[Figure 1. Search Evaluation Sheet. Form fields: Search #; Date;
Monograph? Serial? Known Item? Subject?; Number of successful
machine searches; Number of unsuccessful machine searches; Record
in machine file (Yes/No); Record in card file (P.C.) (Yes/No);
Information present on request; Fields present on record; Request
access points exactly like machine record (Yes/No), and if no, how
they differ (spelling, completeness, word position, other
differences); Explanation.]

A fifth type "Subject" was provided for but not used because
of lack of time.*

In addition each of the searches was categorized according to
whether or not it was found in the public catalog, an "A" being
used to designate those that were not.

Item 2-Search Number

This item was devised to aid correlating NYSILL requests
and computer searches. At first one computer search represented
one NYSILL request; later several requests were included in one
computer search. As it turned out the Search Evaluation Sheet
became a cover sheet for a set of accompanying console printout
sheets.

Item 3-Date

Self explanatory; however it was found useful to include
the time of day of log on and log off of the search.

Item 4-Search Category

Monograph, Serial, Known Item, and Subject categories were
used to record other attributes of a request in addition to Item 1.

* The BCN system was shut down during the course of the experiment
and summer vacations limited the labor which could be spent by
the searcher. The labor was deemed better spent in getting more
searches done rather than transcribing data about them on the
sheet. This of course shifted an unanticipated burden onto the
evaluator (contractor) because most data had to be gathered from
the original documents.



Item 5-Number of successful machine searches
Item 6-Number of unsuccessful machine searches

These items recorded the number of successful and
unsuccessful distinct searches accumulated for the requests. These
data give an indication, before careful review of the run sheets,
as to how well the computer search fared.

Item 7-Record in machine file
Item 8-Entry record in card file

These data were collected only for unsuccessful searches
because if the search succeeded the record was in both files.
These two questions provided data for the evaluation and follow up
of unsuccessful searches.

Item 9-Information present on request

The entry on the NYSILL request was to be transcribed onto
this form. In most instances the NYSILL request itself was
attached to the evaluation sheet.

Item 10-MARC fields present on machine record

The MARC elements on each machine record matched were also
to be filled in by the search coordinator when he verified the
search results.

Item 11-MARC fields present on shelf list record

These data were needed for verification purposes. Also,
they revealed which records that should have been in the data file
were not. A public catalog search would help determine why the
record was not in the data base. Was it not in the 550-599 range,
a new acquisition, a member of a lost batch, etc.?

Item 12-Request access points exactly like machine record

This question was designed to record the variation that
was found to exist between the request words and the machine
record words. It was thought to be one of the most important
items of experimental data to be collected. The design character
of operational systems anticipates that matching must take place
with incomplete or inaccurate data. The nature and possible kinds
of inaccuracies which might occur must be known.

Item 13-Explanation

This item was included for noting any problems or special
conditions and was mostly used for elaborating differences in
search words and machine search words.

3.2.6.2 Tally Sheet

The Tally Sheet was used to summarize the data recorded on
the search evaluation sheets, and to record additional information
about each request. The columns on the Tally Sheet included the
following items:

1. Request search identification.

The request searches were identified by both the request
number from the NYSILL request, and the log in-log out time for
the search. If several requests were included in one computer
search the total time for the entire search is shown divided by
the number of searches to indicate an average time for each search.

2. Date of computer search.



3. Total searches for this request.

This column recorded the number of times a search was made
for a single request. The experiment plan was to search each
request a maximum of three times if it was unsuccessful the first
and second time. However of the 645 searches made, only 60 were
second searches, and only 16 were third searches.

4. Monograph, Serial, Known Item, Subject.

This field was used to tally the request categories recorded
on the search evaluation sheet.

5. Request.

This column was used to tally the type of request data
appearing on the search evaluation sheet.

6. Record in Machine File, Record in Public Catalog.

These data were needed to determine whether an unsuccessful
search was due to the fact that the title was not in the library
(Public Catalog) or had not been encoded into the machine files.
In many cases the fact that a search was successful was sufficient
evidence to indicate 'yes' for both questions regardless of
what the search evaluation sheet said. Type III-A and type II-A
requests, which did not have Dewey numbers listed on the NYSILL
request, were unverifiable without access to the Public Catalog
and therefore were not tallied on these questions.

7. Information present on request.

These columns record which data elements were on the NYSILL
request.

8. Fields present on the machine record.

These columns record the data fields present on the
bibliographic record. The Dewey list was the source of this
information.

9. Request access points = machine access points.

If no, how do they differ? This column summarizes the data
on the separate search evaluation sheets.

10. Request access points searched.

This field recorded which data fields and how many words of
these fields were used in the computer search. Also noted
here was the use of the special search features of the BCN
system. This information was taken from the computer hard copy.

11. Total number of hits.

The number of documents which matched this search is
recorded in this column. In a multiple request search, the total
hits for all requests were not recorded in this column. Only
the number of hits that matched the search terms for the single
request were tallied.

12. Success.

Success means that the desired item was easily identified
among the total number of items matched. No analysis of the
ratio of relevant to non-relevant matches was made.

13. Reason for failure.

In the event that the search was unsuccessful, the tabulator
had to determine why. The possible reasons for failure are:

typing error (on the computer)
not on the data base
search strategy
operator fumble (other than typing), e.g.,
    incorrectly formulated search, omission of
    'list' statement
terminal malfunction
system malfunction
request information misleading
machine record incorrect
untraceable (type III-A and II-A)
should have been retrieved (the search
    was syntactically correct and the record
    was on the data base, but the search failed)
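
One way to keep such a tally consistent is to treat the reasons as a
fixed set of codes. The sketch below is an illustration only: the
enumeration names, the function tally_failures, and the sample
entries are hypothetical, and only the reason labels are taken from
the list above.

```python
from collections import Counter
from enum import Enum


class FailureReason(Enum):
    TYPING_ERROR = "typing error (on the computer)"
    NOT_ON_DATA_BASE = "not on the data base"
    SEARCH_STRATEGY = "search strategy"
    OPERATOR_FUMBLE = "operator fumble (other than typing)"
    TERMINAL_MALFUNCTION = "terminal malfunction"
    SYSTEM_MALFUNCTION = "system malfunction"
    REQUEST_MISLEADING = "request information misleading"
    MACHINE_RECORD_INCORRECT = "machine record incorrect"
    UNTRACEABLE = "untraceable (type III-A and II-A)"
    SHOULD_HAVE_BEEN_RETRIEVED = "should have been retrieved"


def tally_failures(reasons):
    """Count failures per reason, in the order the reasons are declared,
    omitting reasons that never occurred."""
    counts = Counter(reasons)
    return [(reason.value, counts[reason])
            for reason in FailureReason if counts[reason]]


# Invented entries, for illustration only.
observed = [
    FailureReason.NOT_ON_DATA_BASE,
    FailureReason.TYPING_ERROR,
    FailureReason.NOT_ON_DATA_BASE,
]
for label, count in tally_failures(observed):
    print(f"{count:4d}  {label}")
```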

3.3 Test Search and Evaluation

The search experimentation was carried out using the personnel,
procedures, and forms just described and yielded good experimental
results. The BCN system was understood to be an experimental tool
only, and its shortcomings, while they aggravated the experimenter
somewhat, did not seriously affect the data collected.

3.3.1 Conclusions about search effectiveness

Table 3.2.S is a display of the total results of successful
and unsuccessful searches, and totals of errors which caused the
unsuccessful searches. These figures represent the search
effectiveness which could be expected if the BCN system were put
into operation searching NYSILL requests using the current
experimental procedure.

At first glance it might seem that the use of the on-line
BCN system was a failure since the majority (52.5%) of the searches
were unsuccessful (see lines 2 and 3). However this is not really
the case because many unsuccessful searches were due to failures
which would be corrected in an operational system. The following
analysis of the data points out such errors and describes briefly
what can be done about them, along with specific tasks for further
experimentation.

3.3.1.1 Untraceable searches 

The NYSILL requests which were not found in the public
catalog became "A" requests for the on-line experiment. None
of these "A" requests had a Dewey Decimal number on the NYSILL
request, and therefore it was impossible to trace them accurately
in the Dewey ordered listing of the data base to see why they
failed. The search tabulator, using the Dewey Decimal
Classification, classified the requests in an attempt to trace
these records and determine why the search failed. Twenty-nine of
the 36 requests could be classified nearly unambiguously, but even
so they were not found in the data base listing, even through a
scan of several Dewey classes bounding the classified NYSILL
request.

We conclude therefore that these 86 untraceable searches are due
almost entirely to the machine record not being on the data
base, and have assigned them to that error category.

3.3.1.2 Request record not on the data base

If the request record was not successfully retrieved, and
if the Dewey number on the NYSILL request was not found on the
Dewey listing of the data base, then the record was assumed not
to be on the data base. Instances did arise when the Dewey
number on the NYSILL request was incorrect, but these records
were retrieved, or another record with the given Dewey number
was listed in the Dewey Decimal listing of the data base. In
this latter case, two records with the same Dewey Decimal number,
the search tabulator had no means of determining which record
had the incorrect Dewey number, so it was assumed the NYSILL
request was correct.

The number of searches for which records are not in the file
is 155, combining the untraceable and those known not to be on
the file. In an operational system all the library's holdings
will be on the data base. The computer will not be asked to
retrieve what is not in the data base, unless for an acquisition
support system. Thus the percentage of unsuccessful searches due
to records not on the data base will not be as high as in this
experiment, unless the data base is not kept up-to-date.

The new measure of effectiveness excluding these 155 searches
is 62.5% successful and 37.5% unsuccessful.
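
The adjustment can be sketched as a small calculation. This is an
illustration under stated assumptions: the function name and the
cause labels are hypothetical, and the count of successful searches
is not given directly in the text, so it is reconstructed here from
the 645 searches and the 52.5% failure rate quoted above.

```python
def adjusted_effectiveness(successful, failures_by_cause, excluded=()):
    """Percent successful after dropping failures whose cause is excluded."""
    remaining_failures = sum(count
                             for cause, count in failures_by_cause.items()
                             if cause not in excluded)
    total = successful + remaining_failures
    return 100.0 * successful / total if total else 0.0


total_searches = 645
# Assumption: reconstructed from the rounded 52.5% figure in the text.
unsuccessful = round(total_searches * 0.525)
successful = total_searches - unsuccessful
failures = {
    "not_on_data_base": 155,              # count given in the text
    "all_other_causes": unsuccessful - 155,
}

raw = adjusted_effectiveness(successful, failures)
adjusted = adjusted_effectiveness(successful, failures,
                                  excluded={"not_on_data_base"})
print(round(raw, 1), round(adjusted, 1))
```

With these reconstructed counts the adjusted figure comes out close
to the 62.5% reported. Exact agreement is not expected, because the
success count is derived from a rounded percentage.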

3.3.1.3 Operator errors

There were three types of operator errors: query typing
errors, search syntax errors, and search strategy errors.

3.3.1.3.1 Query typing errors

As the search assistant types a search in the BCN Doc. Proc.
System command language, there is a chance he will make a typing
error. This was more likely to happen at the beginning of the
actual test searching and continued until the operator became
more skilled. The number of failures due to operator typing was
37 or 5.8%.

3.3.1.3.2 Search syntax errors

As the search assistant formulates and types the search,
there is a chance he will make an error in the required syntax
of the search. For example, he might forget to ask that the
search results be listed, or he might forget to specify the data
base to be searched. The number of failures due to improper
search syntax was 36 or 5.6%.

3.3.1.3.3 Search strategy errors

The search assistant was not instructed as to the meaning
of the Type I, II, II-A, etc., designations. Given a group of
requests to search, he made his own inference as to what type of
citation each request was, and subsequently based his search
strategy on this inference. In some instances when the search
assistant's inference was incorrect, the subsequent search
strategy could not possibly result in a successful search. In
other instances, incorrect inferences made no difference. The
number of such failures was 33 or 5.2%.

3.3.1.3.4 Effect of operator errors on search accuracy

The total number of operator errors was 106, which caused
a total of 16.6% of the searches to be in error. In an operational
system several factors would exist which would prevent such errors
and would correct those which were initially present so that they
would be eliminated by the time the search was completed.

First, in an operational system the terminal query language
would be tailored to catalog searching so that it would be a
simple console process not requiring the memorizing of a series
of complex commands nor the typing of such search constants as
data base name. This simpler language would eliminate a large
number of typing and search specification errors.

Secondly, an operational system would be staffed by library
personnel skilled in the reference and search function, who
would become, after training, thoroughly familiar with the basic
operation, and would make fewer conceptual errors.

Thirdly, a system designed specifically for search would
have terminal language diagnostics and feedback error messages
and comments so that logical errors which enter the system could
be presented to the searcher for correction. Also in an operational
system the correct portions of a modified query would not have to
be retyped.

Finally, the search speed of an operational system would
be faster, so that searches which contained errors and produced
ineffective results could be searched again with minimum
expenditure of time and effort.

3.3.1.4 Equipment errors

The equipment failed during the testing due to both software
and hardware. For purposes of the experiment the errors
were categorized into terminal failures and system failures.

3.3.1.4.1 Terminal failures

These failures were primarily due to the fact that the
terminal used was a light duty one, and appeared never to have been
heavily enough used to work out its mechanical difficulties.
Under the initial spurt of heavy use in the experimentation,
failures occurred which were then fixed. There were 17 failures
for a percentage of 2.7%.

3.3.1.4.2 System failures

The software and hardware of the BCN system failed 36
times for a rate of 5.6%. The combined equipment failure rate
was 8.3%, which in an operational system could be overcome by
duplication of consoles and repetitive searching. In all
commercial time sharing systems, the downtime is less than 1%
and the mean downtime interval is approximately ten minutes.
Also most systems have a "fail-safe" feature so that most
search queries and partial results are saved and a search can
continue from where it left off.

If such procedures can be established to eliminate these
failures, then the search effectiveness is 93% successful and
7% unsuccessful.

3.3.1.5

The remaining three reasons for unsuccessful searches cannot
be solved by presently known developments. In an operational
system, these are basic limitations and will contribute to error.

3.3.1.5.1 Machine record incorrect

These failures are a result of the input system used.
They should be analyzed further to determine if there is any
pattern amongst machine record errors. For example, are they
tagging errors, or keying errors? Are keying errors most likely
when inputting numbers? At this point in the experiment these
errors do affect the success rate, but in any future system for
the New York State Library machine record errors can be held to
a minimum.

3.3.1.5.2 Request information misleading

This was a particularly perplexing problem amongst requests
that appeared to be author title requests, but which turned out
to be requests for a member of a monograph series. Further
work should be done to design a search strategy which will
resolve many of these ambiguities automatically. Users should
also be encouraged to re-search requests, trying as many different
access points as possible.

3.3.1.5.3 Should have been retrieved

It was impossible to determine why one particular search
failed. The record was on the data base, the search was typed
correctly, the search was formulated correctly, and the terminal
seemed to be working properly. This we would label an unassignable
error, which it would appear would occur in an operational system
as it did in the experiment.



3.3.2 Characteristics of requests and matches in the
on-line search experiment

During the experiment several data items were recorded to
discover how particular classes of requests matched. These
data are compiled and tabulated in the following paragraphs
under the following topics:

a. The distribution of matches per successful request
search

b. Differences between request record and machine
catalog record

c. The distribution of the number of words used per
search

d. Data element usage

3.3.2.1 The Distribution of matches per successful search

Table 3.2.7.1 is a display of the number of documents
retrieved per successful search. The majority (74%) of the
searches retrieved only the desired document, 17% of the cases
retrieved the desired document and one additional document, and
5.3% of the cases retrieved the desired document and
two others.

In preparation for searching, it was decided five was a
tolerable number of hits, although only one of the five was
the desired item. These results show that if the search was
successful then it was very successful in terms of retrieving
fewer than the allowed five documents.

One must remember not all cases were successful, and the
above results may be only a portion of the picture. The
unsuccessful searches which retrieved documents could also be
tallied, to ascertain if the majority of all searches retrieve
only one document, and if there is not a greater range in the
number of documents retrieved. Also, this number of matches is
dependent on file size, so that if additional records were
included, additional non-relevant matches would be made.

3.3.2.2 Request record v. machine record

The statistics in Table 3.2.7.2 show that differences did
exist between request records and machine records. There were
differences in 47.81% of all searches and there were no differences
in 22.97% of all searches. This finding corroborates what we
already know, namely that few library patrons will know the exact
catalog entry. Any query system designed for on-line searching
without the book in hand will need to bear this in mind.

It was not determined if the request record differed from
the machine record in 29.22% of the searches. It was assumed
that these cases would later be re-searched. These requests should
be re-searched until it can be determined whether or not the
request record is the same as the machine record.

The type IV requests had a high percentage of different
entries (38.63%) and a high percentage of incompleteness (61.36%),
because they were constructed to be misleading and often leading
words were omitted from the request.

Type I records had a high percentage of incomplete records
(or entries), 61.32%, because NYSILL requests often include only
the author's initials, instead of the full name. This did not
hinder retrieval. The surname and initials, in most cases,
uniquely identify the author.

3.3.2.3 The Distribution of the Words used per search 

The majority (88%) of all searches used 1, 2, or 3 words
in a search. Zero search words was an error in search syntax
on the part of the searcher. 23% used 1 word, 33% (the largest
group) used 2 words, and 32% used 3 words.

The range of words per search was 4 for types III and III-A,
5 for types I, II, II-A, and 8 for type IV. Type IV, the synthetic
searches, tended to use more words, probably because the searcher
was not sure what he was searching.

No particular number of words per search was used in a
majority of cases.

In conclusion, no matter what type of request record is
being searched, the search system must allow for multiple word
searches. Search algorithms should be based on from one to three
words of the search request.

3.3.3 Percentage Data Elements

Table 3.2.7.4, entitled "Percentage of Each Data Element's
Use in the On-Line Searches", is a tabulation of the MARC elements
used in the searches along with a tabulation of the use of the
BCN-DPS special search features. The MARC elements that were used
either alone or in combination with other MARC elements are:
main entry, title, serial indicator, series, subject, general
note, and dissertation note. The BCN-DPS truncation search
allows one to spell the beginning of a word and punctuate it with
a special symbol ($). This is useful in searching for inflected
forms. For example, one can search alternate, alteration,
alternatives by typing alter($). A second Biomedical search feature
that was used is the scan feature, whereby one may search data
fields that were not automatically indexed by the BCN-DPS system.
Thus one may search the general note field for a report number, or
the dissertation note fields for indication that the document is
indeed a dissertation.
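
The two features can be contrasted in a short sketch. It is an
illustration only and makes several assumptions: the function names,
the in-memory word list and record dictionary, the field name
general_note, and the sample data are all hypothetical. The alter($)
notation is copied from the text, and nothing here reproduces the
actual BCN-DPS command syntax or internals.

```python
def truncation_search(index_words, pattern):
    """Return the indexed words matching a pattern. A pattern written as
    'stem($)' matches every word beginning with the stem; any other
    pattern must match a word exactly."""
    if not pattern.endswith("($)"):
        return sorted(w for w in index_words if w == pattern.lower())
    stem = pattern[:-3].lower()
    return sorted(w for w in index_words if w.startswith(stem))


def scan(records, field, text):
    """Examine the named field of every record for the given text. Every
    record is read, which is why scanning costs more than an index lookup."""
    needle = text.lower()
    return [number for number, record in records.items()
            if needle in record.get(field, "").lower()]


# Invented data, for illustration only.
index_words = {"alternate", "alteration", "alternatives", "altitude"}
records = {
    1: {"general_note": "Report no. AB-123"},
    2: {"general_note": "Includes index"},
}

print(truncation_search(index_words, "alter($)"))
print(scan(records, "general_note", "ab-123"))
```

The truncation lookup touches only the word list, while the scan
reads every record, which matches the cost difference between the
two features.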

As the table indicates, the frequency of the use of MARC
elements depended very much on the type of request being searched.
The search assistant would infer from the NYSILL request that
he had a personal author citation, corporate author citation, etc.
Personal author request (Type I) was searched under main entry
and title fields in 76.71% of the searches. Title main entry
(Type II) was searched under title in 71.25% of the searches.
Synthetic requests (Type IV) required the greatest diversity of
MARC data element categories. This is largely because the
synthetic requests were assimilated from a wider range of data
fields than are liable to be in the NYSILL requests.

In general, fixed field information was not used in any of
the searches. The serial indicator is the exception to the
rule, and it was used in 6.8% of all searches. It should be kept
in mind that there were not many serial entries included in
the data base: 1,000 of the 18,000 records were serials. Therefore
successful searches of serial records were not as numerous
as non-serial searches.

The BCN truncation feature was used in 1.7% of all searches.
The scan feature was used in 1.00% of all searches. It should
be noted that the scan feature was only used on data fields
not indexed automatically by BCN-DPS. It is costly to use the
scan feature, as it searches character by character. Therefore,
it is wise to avoid the scan feature whenever other
search access points are available. However, the utility
of such a feature and of the truncation feature should be kept in mind.
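
The cost of the scan feature follows from the way it works: every
record's field must be read in full. A minimal sketch, assuming
records held as simple dictionaries (the field names and sample data
are hypothetical and are not taken from the BCN-DPS system):

```python
def scan(records, field, needle):
    """Return ids of records whose `field` contains `needle`.

    Every record is examined and its field is compared character by
    character, so the work grows with the size of the whole file.
    """
    needle = needle.upper()
    return [rec_id for rec_id, rec in records.items()
            if needle in rec.get(field, "").upper()]


# Hypothetical records, invented for illustration only.
RECORDS = {
    101: {"title": "HEAT TRANSFER", "general_note": "REPORT NO. TR-45"},
    102: {"title": "FLUID FLOW", "dissertation_note": "THESIS--CORNELL"},
    103: {"title": "SOIL MECHANICS"},
}

if __name__ == "__main__":
    print(scan(RECORDS, "general_note", "TR-45"))        # report number
    print(scan(RECORDS, "dissertation_note", "thesis"))  # is it a thesis?
```

An indexed lookup touches only the entries for one word, which is why
the scan is best reserved for fields that have no index.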

It should be noted that many of the MARC data fields were
not used in the searches. Some of these data fields are: imprint,
collation, edition, Government Printing Office number, SBN number,
etc. Many of these fields are either incomplete or omitted in
the NYSILL request, so the search assistant might not have had
them in front of him when he made his request and therefore did
not use them.

Section 3.3.4 Synthetic searches

Synthetic search requests were so named for their method
of development. Catalog citations within the given Dewey
classification range were rewritten as citations without the
required NYSILL information. In other words, they were rewritten
as vague reference questions of the kind first encountered at the
reference desk, before the search refinement given NYSILL requests.
A total of 74 synthetic search requests were searched by the Science
and Technology Division in the catalog, but only 50 were
searched by the search assistant in the machine data base, so all
statistics refer to the 50 searched in both files.

Synthetic Search Results

Found in both                 19    38%
Found only in card catalog     3     6%
Found only in machine file    19    38%
Neither                        9    18%
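
The percentages in the table can be recomputed from the counts. The
short sketch below is an arithmetic check only; the variable names are
ours, and the two totals at the end are simple sums of the table rows.

```python
# Counts taken from the Synthetic Search Results table.
RESULTS = {
    "found in both": 19,
    "found only in card catalog": 3,
    "found only in machine file": 19,
    "neither": 9,
}
TOTAL = sum(RESULTS.values())


def percent(count, total=TOTAL):
    """Share of the searches, rounded to a whole percent."""
    return round(100 * count / total)


SHARES = {name: percent(count) for name, count in RESULTS.items()}

# Searches in which each file produced the item at all.
FOUND_BY_MACHINE = RESULTS["found in both"] + RESULTS["found only in machine file"]
FOUND_BY_CARD = RESULTS["found in both"] + RESULTS["found only in card catalog"]

if __name__ == "__main__":
    for name, share in SHARES.items():
        print(f"{name:28s} {RESULTS[name]:3d} {share:3d}%")
    print("machine file:", FOUND_BY_MACHINE, "of", TOTAL)
    print("card catalog:", FOUND_BY_CARD, "of", TOTAL)
```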

Overall the computer search fared much better than the human search.
This is true despite the fact that these vague synthetic searches were
searched by personnel from the Science and Technology Division,
who would be most familiar with the subject area. No record
was kept of how long each card catalog search lasted, or of
how many false starts were made before some answer, the correct
title or no title, was discovered. This information is available
for the machine searches, only two of which were searched more
than once, and one of these repeat searches was not necessary. It is
very likely that the card catalog searches were tried more than once.
It is a very rare and valuable searcher who can solve reference
questions with one try. On the other hand, the machine techniques,
although not well suited to library problems, still had a
success rate with only one search of over three times that of
the (experienced) human searcher.

The card catalog was successful in 3 searches (6%) in
which the machine failed. It is interesting to note why the machine
search failed. In the first case, the search assistant "and-ed"
three title words. Unfortunately one of these words was in fact
not in the title, so the search failed due to an operator error.
The human searcher probably knew the acronym included in the
title, and so had little trouble in the card catalog. In the
second case, the search assistant searched on more of the series
entry than was indexed. This points up an improvement for future
systems: all series entries should be fully indexed. In the
third and last case, the author's surname was searched in the
general notes and not in the contents notes, as it should have
been. This is an operator error. If he knew the surname was not
the main entry but a note, then he should have guessed that it
was a contents note, not a general note. Thus there were two operator
errors and one data base limitation failure. It is very likely that
both operator errors would have been caught on a second search.
Improved indexing could be obtained through the use of a larger
file so that the entries could be found by query using any access
word.

In sum, these three cases do not really prove the card
catalog superior to the machine search strategies or files, but
rather show problems that must be provided for in an on-going
library search system. The machine searches were far more
successful on only one search than the card catalog searches.

4. THE USE OF MODIFIED MARC RECORDS IN THE
NYSL EXPERIMENTAL DATA BASE

The design of the conversion experiment provided for two
types of encoded records: one, a complete catalog record encoded
using a complete set of MARC II tags, and a second, a complete
catalog record encoded using an abbreviated set of tags. The
intent of the experiment with an encoded file using abbreviated
tagging was to determine whether costs would be saved by converting
to such a file, and if so, whether searching effectiveness would
be impaired by abbreviated tagging. In the conversion
experiment, two tasks were carried out to create these files.
Task I created fully tagged records and Task II created records
with abbreviated tagging.

4.1 Specification of Task II Tagging Elements

The file to be created in Task II differed from fully tagged
MARC II records in two general ways: first, the record format and
character set were different, and second, the data elements included
in the record were different.

4.1.1 Differences in record format and character usage

The following list describes differences of the NYSL
Task II record from the standard L.C. MARC record (Task I record):

1. Task II records use the character set specified in Table 1.
No lower case alphabetic characters are used.

2. The character is used as the subfield delimiter, is
used as the end of field mark, the two characters "=S" are
used as the end of record mark.

3. Accent and diacritical marks are not used.

4.1.2 Differences in data elements in the Task II records
(abbreviated tagging)
All indicators are blank except in the following fields:

Field    1st Indicator    2nd Indicator

041      0 or 1
100                       0 or 1
110                       0 or 1
111                       0 or 1
260      0 or 1
400                       0 or 1
410                       0 or 1
411                       0 or 1
505      0, 1 or 2

The following fields are never present:

040     071
051     241
060     350
070

The following subfields never occur:

050 subfield "B"

300 subfield "B" and "C".

In field 008:

Character 22, Intellectual code must always be M.

Character 30, Festschrift indicator must always be 0.

Character 31, Index indicator must always be 0.

Character 32, Main Entry indicator must always be 0.

Character 33, Fiction indicator must always be 0.

Character 38, Modified record indicator must always be 0.

Character 39, Cataloging code must always be M.
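
The fixed values listed for field 008 lend themselves to a mechanical
check. The following is a minimal sketch, not part of the conversion
programs used in the experiment. It assumes that field 008 is 40
characters long, that positions are counted from 0 (so that character
39 is the last one), and that the required values are the characters
"M" and "0" as listed above.

```python
# Required values for Task II records, keyed by character position.
TASK2_FIXED_008 = {22: "M", 30: "0", 31: "0", 32: "0", 33: "0", 38: "0", 39: "M"}


def check_008(field):
    """Return (position, expected, found) for every violated position."""
    if len(field) != 40:
        # Assumption: the fixed field is exactly 40 characters long.
        raise ValueError("field 008 is assumed to be 40 characters long")
    return [(pos, want, field[pos])
            for pos, want in sorted(TASK2_FIXED_008.items())
            if field[pos] != want]


def make_valid_008():
    """Build a blank field 008 that carries only the required values."""
    chars = [" "] * 40
    for pos, value in TASK2_FIXED_008.items():
        chars[pos] = value
    return "".join(chars)


if __name__ == "__main__":
    good = make_valid_008()
    bad = good[:30] + "1" + good[31:]  # Festschrift indicator set to 1
    print(check_008(good))  # no violations
    print(check_008(bad))   # one violation, at position 30
```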

Character 22 of the record leader is the completeness indicator.
LC does not use this indicator.

I - record does not contain all bibliographic information.
C - record has complete bibliographic information.

Character 23 of the record leader is the Task indicator. LC
does not use this indicator.

1 - Task 1

2 - Task 2

Data field 001 contains the NYSL Accession number, not the
LC-Card Number. This field is always 13 characters (including
end of field mark), right justified and blank filled.

Data field 010 contains the LC-card number. LC does not use
this field. $a is the only subfield code. The indicators
are not used.

Data field 082 contains the NYSL-Call Number, not the LC-Dewey
Decimal number. $a is the only subfield code and can occur
only once. The indicators are not used.

Data field 901 is the NYSL-Holdings. LC does not use this
field.

$a - delimits the start of the data.

$m - separates each accession number from the holding
information.

The indicators are not used. For example:

W$Alil06700$Mc.l,¥l^o678l$MC,2,WiJ06r82$]^C, 35 

Data field 902 is the NYSL-Batch control field, and is
always present. LC does not use this field.
$a is the only subfield code.

The indicators are blank. The data is a 2 digit Batch
number, a 1 character Cataloging code ("A" or "C") and
an optional Inventory code (if present it is "M", i.e.
missing in last inventory).
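
The layout of the 902 data can be expressed as a short pattern. The
sketch below is illustrative only. It assumes that the three elements
are written one after another with no separators, which the
description above suggests but does not state, and the function and
key names are ours.

```python
import re

# Two digits, then "A" or "C", then an optional "M".
BATCH_CONTROL = re.compile(r"^(\d{2})([AC])(M?)$")


def parse_batch_control(data):
    """Split the $a data of field 902 into its three elements."""
    match = BATCH_CONTROL.match(data)
    if match is None:
        raise ValueError(f"not a valid batch control value: {data!r}")
    batch, cataloging_code, inventory_code = match.groups()
    return {
        "batch": int(batch),
        "cataloging_code": cataloging_code,
        # "M" means the item was missing in the last inventory.
        "missing_in_inventory": inventory_code == "M",
    }


if __name__ == "__main__":
    print(parse_batch_control("07A"))
    print(parse_batch_control("12CM"))
```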

Field 008, character 28, the Government Publication
Indicator, is not like LC MARC. It contains:

M not a government document

F federal

L local

S state

A foreign

U international

Character 7 of the leader, the Bibliographic level, is not
like LC MARC. It contains:

M monograph

S serial, with complete holdings listed
P serial, with incomplete holdings listed

4.2 Experimental Results

The results of the conversion experiment showed that there
was no discernible difference in the cost of encoding either
the Task I or Task II records. Although it would seem that
there would be savings using the abbreviated tagging, the
following observations were made which support the finding:

4.2.2 Number of characters per record nearly equal for both
records

The number of characters per record saved by abbreviated
tagging was insignificant. Therefore all keyboarding,
proofreading and data file correction costs were equivalent.

4.2.3 Editing and tagging difficulties not related to
abbreviated tagging

The last remaining cost element, editing and tagging, as it
turned out, is not reduced by an abbreviated set of tags. The
difficult tagging decisions (those which contribute excessively
to cost) are of a cataloging nature and exist whether or not an
abbreviated set of tags is used.



NYSL MARC TAPE CHARACTER SET

Tape: 7 track, odd parity, 800 bpi

Octal  Character    Octal  Character    Octal  Character    Octal  Character

00     0            20     +            40                  60     blank
01     1            21     A            41     J            61     /
02     2            22     B            42     K            62     S
03     3            23     C            43     L            63     T
04     4            24     D            44     M            64     U
05     5            25     E            45     N            65     V
06     6            26     F            46     O            66     W
07     7            27     G            47     P            67     X
10     8            30     H            50     Q            70     Y
11     9            31     I            51     R            71     Z
12                  32     <            52                  72     ]
13                  33     .            53     $            73
14                  34     )            54                  74     (
15     '            35                  55     #            75     _
16                  36                  56     &            76     ---
17                  37     *            57     >            77

Note:

15 apostrophe

75 underscore

76 triple hyphen

5. CONCLUSIONS AND SUGGESTIONS FOR FUTURE RESEARCH
From the experimental work performed, we conclude that
converting the NYSL shelf list sample to machine readable form, and
searching this shelf list using a Remote Access Catalog are
technically sound concepts. The conversion procedure used is
workable and would be practical in a production environment with
a few technical modifications. A second finding is that the
search methods used in the experiment are satisfactory. The
search "power" was found adequate using the experimental system,
and although it was slow, by analysis one could design a system which
can search faster and more conveniently than manual searching of
card files.

5.1 Cost of a Remote Access Catalog

Although the experiments showed the projected conversion
and search concepts supporting a Remote Access Catalog to be
technically satisfactory, the capital costs of data conversion
and system installation will be substantial.

Data conversion costs were calculated to be $1.74 per entry, so
that $1,113,600 is required to convert the 640,000 entries in the
shelf list. By comparison with other systems, the conversion
costs were shown to increase with an increase in quality. In
order to reduce costs, any projected NYSL conversion should
make the most of retrospective conversion performed by L.C. and
others. Per record costs using available records are estimated
at $.26. Although no data are available from which one
could calculate the percentage of one's holdings available in other
libraries, there is a potential cost reduction of 85%.

No data are available at this time from which one could
project savings using format recognition. From the experimental
work performed on encoding records with an abbreviated tagging
system, we conclude that the cost savings are insignificant.
Within the accuracy of the experiment, no discernible cost savings
could be verified. Further, if there were minimal savings they
would have to be weighed against the disadvantage of not creating
standard records which could be shared with other libraries, a
concept that appears to offer greater data base encoding savings.
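
The conversion cost figures quoted in this section can be
cross-checked with a few lines of arithmetic. The sketch below is not
part of the original study. It takes the cost per entry as $1.74, the
shelf list as 640,000 entries and the cost per record using available
records as $.26, and it treats the 85% figure as the saving on a record
that is available elsewhere. The variable names are ours.

```python
from fractions import Fraction

# Exact decimal amounts, held as fractions to avoid rounding error.
COST_PER_ENTRY = Fraction(174, 100)      # dollars, original conversion
COST_AVAILABLE = Fraction(26, 100)       # dollars, using available records
SHELF_LIST_ENTRIES = 640_000

# Cost of converting the whole shelf list from scratch.
TOTAL_COST = COST_PER_ENTRY * SHELF_LIST_ENTRIES

# Fraction of the per record cost saved when a record already exists.
REDUCTION = 1 - COST_AVAILABLE / COST_PER_ENTRY

if __name__ == "__main__":
    print(f"total conversion cost: ${int(TOTAL_COST):,}")
    print(f"potential reduction:   {float(REDUCTION):.0%}")
```

The product reproduces the $1,113,600 total, and the ratio of the two
per record costs gives the 85% potential reduction. The overall saving
for the library would depend on what share of its holdings is actually
available from other sources, which the study could not determine.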

5.2 Future Research

During the course of the experimentation, problems occurred,
questions arose, and new ideas occurred to the investigators
which could form the basis for further research. We have
divided these suggestions for future research into three categories:

a. Investigation of new or modified concepts in conversion
and search,

b. Specification of improvements needed for a production
system,

c. The specification of new uses for a remote access catalog.

5.2.1 Investigation of new conversion and search concepts

5.2.1.1 The use of on-line prompting terminals to convert data

An alternative to complete tagging of MARC records is to
have the typist encode data by following the queries of a
computer program connected to the terminal. Such a procedure
holds the prospect of substantially reducing the amount of
tagging required.

5.2.1.2 Search of NYSL sample in other data bases

It would be useful to search a sample of NYSL entries in
the libraries which have encoded their retrospective collection
in MARC II format. This overlap analysis would provide information
needed in the calculation of the costs saved by using the MARC
records of other libraries.

5.2.1.3 Investigation of the loss of utility of a catalog
containing a certain percentage of error

There is an indication that errors in a catalog do not impair
its usefulness as much as its esthetics. A study of this problem
with the objective of determining an acceptable error rate would
be useful, and perhaps point the way to lower cost conversion of
an acceptable quality.

5.2.1.4 Investigation of additional search logic capabilities

There are problems of searching where the truncation feature
is not powerful enough, such as text book and textbook. In
addition it is thought that positional description of terms (A
followed by B) would be useful. An analytical investigation of
these and other complex search problems, such as corporate author,
would be useful and necessary before any large commitment was
made to a future system.
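
One way to picture the positional logic suggested here is a test for
"A followed by B" that also accepts the two words run together. The
sketch below is a hypothetical illustration of the idea and not a
feature of the experimental system. The function names and sample
titles are assumptions.

```python
def followed_by(text, a, b):
    """True if word `a` is immediately followed by word `b` in `text`."""
    words = text.upper().split()
    a, b = a.upper(), b.upper()
    return any(x == a and y == b for x, y in zip(words, words[1:]))


def matches_compound(text, a, b):
    """True for 'A B' as adjacent words or 'AB' written as one word."""
    joined = (a + b).upper()
    return joined in text.upper().split() or followed_by(text, a, b)


if __name__ == "__main__":
    for title in ["A text book of physics",
                  "Physics textbook",
                  "Book of text"]:
        print(title, "->", matches_compound(title, "text", "book"))
```

Truncation on "text" alone would retrieve all three titles, including
the third, which is not a textbook at all. The positional test keeps
the first two and rejects the third.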

5.2.1.5 Study of search system response time characteristics

In the experiment, waiting for searches, although it
was understood as a limitation of the experiment at the outset,
was bothersome. However, little or no data was collected which
would tell one what an optimum response time was. Also it is not
known whether a complete search is needed rapidly or merely the
search to the first item which can be displayed. In the latter
approach, the computer system would use the reading time taken
by the user to perform additional searches, and would relieve
peak loads. An analysis of this search response time requirement
would be useful in determining equipment configuration and cost
characteristics of any Remote Access Catalog configuration.
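
The "search to the first item" approach can be pictured as a search
that hands back each hit as soon as it is found and continues only
when more are asked for. The sketch below is one hypothetical way to
express this, using a Python generator. The record list and names are
invented for illustration.

```python
def search_lazily(records, word, scanned):
    """Yield matching (id, title) pairs one at a time.

    `scanned` is a list in which the ids examined so far are noted, so
    the caller can see how much of the file was read before each hit.
    """
    word = word.upper()
    for rec_id, title in records:
        scanned.append(rec_id)
        if word in title.upper().split():
            yield rec_id, title


# Hypothetical records, invented for illustration only.
RECORDS = [
    (1, "Soil mechanics"),
    (2, "Bridge design manual"),
    (3, "Highway design"),
    (4, "Design of dams"),
]


def first_hit_demo():
    scanned = []
    hits = search_lazily(RECORDS, "design", scanned)
    first = next(hits)             # displayed to the user at once
    seen_at_first = list(scanned)  # only part of the file read so far
    rest = list(hits)              # found while the user is reading
    return first, seen_at_first, rest, scanned


if __name__ == "__main__":
    first, seen_at_first, rest, scanned = first_hit_demo()
    print("first hit:", first, "after reading records", seen_at_first)
    print("remaining hits:", rest, "after reading records", scanned)
```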

5.2.1.6 Search system response time

One result of the search experiment pointed out that there
may be two classes of searches, one of which is bothersome to
wait for, the other not. When the result of a search is unpredictable
or where feedback is necessary to reformat the search, then rapid
system response is necessary. Searches which are predictable, such
as interlibrary loan requests for monographs searched by personal
author main entry, do not need a rapid response because the outcome
is definite and, depending on match or no match, the next step is
known. We think this is an area worthy of further study.

5.2.2 Specifications of improvements needed in a production
system

The present experiment experienced a great deal of difficulty
in document control. It would be useful to specify a control
system consisting of both manual procedures and computer tabulation
which would insure that production flowed smoothly with no
lost documents or duplicated conversions.

5.2.3 Study of additional use of the remote access catalog

During the course of the experiment several questions
arose concerning the use of the data base for other than the
Remote Access Catalog and the effect of those uses on the
requirements of the data base. In addition, the large cost of
the data base suggests that it be used for other purposes. It
seems appropriate therefore to study the possible uses of such a
base and document them in a set of general system requirements.